An Approach to Automation Selection of Decision Tree based on Training Data Set

Authors

  • D. Saravanakumar
  • N. Ananthi
  • M. Devi
  • Amir Bar-Or
  • Daniel Keren
  • Assaf Schuster
  • Arun K Pujari
  • H. Vafaie K. DeJong
  • Carla E. Brodley
  • Paul E. Utgoff
  • Andrew B. Nobel
  • Rakesh Agrawal
  • Tomasz Imielinski
  • Arun Swami
  • Donato Malerba
  • Floriana Esposito
Abstract

In data mining applications, very large training data sets with several million records are common. Decision trees are a powerful technique for both classification and prediction problems. Many decision tree construction algorithms have been proposed to handle large or small training data sets; some are best suited to large data sets and others to small ones, and each works best under its own criteria. The decision tree algorithms classify categorical and continuous attributes very well, but they handle only smaller data sets efficiently and take more time on large data sets. Supervised Learning In Quest (SLIQ) and Scalable Parallelizable Induction of Decision Trees (SPRINT) handle very large data sets. However, SLIQ requires that the class labels be available in main memory beforehand. SPRINT is best suited to large data sets and removes these memory restrictions. This research work deals with the automatic selection of a decision tree algorithm based on the training data set size. The proposed system first determines the training data set size using a mathematical measure. The resulting training set size is then checked against the available memory space. If memory is sufficient, tree construction continues. After classifying the data, the accuracy of the classifier data set ...
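The selection step described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not state which mathematical measure is used for the data set size, so the byte estimate here (records times attributes times a fixed value width) is an assumption, as are the names `DatasetShape`, `estimated_data_bytes`, `estimated_class_list_bytes`, and `select_algorithm`. The three-way rule only encodes what the abstract states: an in-memory algorithm when the data fits, SLIQ when at least the class labels fit in main memory, and SPRINT otherwise.

```python
from dataclasses import dataclass


@dataclass
class DatasetShape:
    """Hypothetical description of a training set (not from the paper)."""
    n_records: int
    n_attributes: int
    bytes_per_value: int = 8  # assumed width of one attribute value
    bytes_per_label: int = 4  # assumed width of one class label


def estimated_data_bytes(shape: DatasetShape) -> int:
    # Assumed size measure: every record stores all attributes plus one label.
    per_record = shape.n_attributes * shape.bytes_per_value + shape.bytes_per_label
    return shape.n_records * per_record


def estimated_class_list_bytes(shape: DatasetShape) -> int:
    # Simplification: one class label per record must stay memory-resident.
    return shape.n_records * shape.bytes_per_label


def select_algorithm(shape: DatasetShape, available_bytes: int) -> str:
    """Pick a tree-construction strategy from the size/memory comparison."""
    if estimated_data_bytes(shape) <= available_bytes:
        return "in-memory"  # whole training set fits in memory
    if estimated_class_list_bytes(shape) <= available_bytes:
        return "SLIQ"       # only the class labels fit in memory
    return "SPRINT"         # no memory-residency requirement


if __name__ == "__main__":
    small = DatasetShape(n_records=1_000, n_attributes=10)
    large = DatasetShape(n_records=5_000_000, n_attributes=20)
    print(select_algorithm(small, available_bytes=1_000_000))
    print(select_algorithm(large, available_bytes=100_000_000))
    print(select_algorithm(large, available_bytes=10_000_000))
```

In practice the available memory would be read from the operating system rather than passed as a constant, and the size estimate would need to reflect the actual storage format of the attributes.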

Similar resources

Anomaly Detection Using SVM as Classifier and Decision Tree for Optimizing Feature Vectors

With the advancement and development of computer network technologies, the way for intruders has become smoother; therefore, to detect threats and attacks, the importance of intrusion detection systems (IDS) as one of the key elements of security is increasing. One of the challenges of intrusion detection systems is managing the large number of network traffic features. Removing un...

Evaluation of liquefaction potential based on CPT results using C4.5 decision tree

The prediction of liquefaction potential of soil due to an earthquake is an essential task in Civil Engineering. The decision tree is a tree structure consisting of internal and terminal nodes which process the data to ultimately yield a classification. C4.5 is a known algorithm widely used to design decision trees. In this algorithm, a pruning process is carried out to solve the problem of the...

Extended MULTIMOORA method based on Shannon entropy weight for materials selection

Selection of appropriate material is a crucial step in engineering design and manufacturing process. Without a systematic technique, many useful engineering materials may be ignored for selection. The category of multiple attribute decision-making (MADM) methods is an effective set of structured techniques. Having uncomplicated assumptions and mathematics, the MULTIMOORA method as an MADM appro...

Fuzzy multi-criteria selection procedures in choosing data source

Technology assessment and selection has a substantial impact on organizations' procedures with regard to technology transfer. Technological decisions are usually made by a group of experts, and integrating these viewpoints into a single decision can be quite complex. Today, operational databases and data warehouses exist to manage and organize data with specific features and henceforth, th...

Ensemble Classification and Extended Feature Selection for Credit Card Fraud Detection

Due to the rise of technology, the possibility of fraud in different areas such as banking has increased. Credit card fraud is a crucial problem in banking and its danger is ever increasing. This paper proposes an advanced data mining method, considering both feature selection and decision cost for accuracy enhancement of credit card fraud detection. After selecting the best and most effec...

Publication date: 2016